{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 27.1 The basics of dynamic multithreading"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 27.1-1\n",
    "\n",
    "> Suppose that we spawn P-FIB$(n - 2)$ in line 4 of P-FIB, rather than calling it as is done in the code. What is the impact on the asymptotic work, span, and parallelism?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "No change."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 27.1-2\n",
    "\n",
    "> Draw the computation dag that results from executing P-FIB(5). Assuming that each strand in the computation takes unit time, what are the work, span, and parallelism of the computation? Show how to schedule the dag on 3 processors using greedy scheduling by labeling each strand with the time step in which it is executed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](./img/27.1-2_1.png)\n",
    "\n",
    "* Work: $T_1 = 29$.\n",
    "* Span: $T_\\infty = 9$.\n",
    "* Parallelism: $T_1 / T_\\infty \\approx 3.2$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 27.1-3\n",
    "\n",
    "> Prove that a greedy scheduler achieves the following time bound, which is slightly stronger than the bound proven in Theorem 27.1:\n",
    "\n",
    "> $\\displaystyle T_P \\le \\frac{T_1 - T_\\infty}{P} + T_\\infty$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$T_1 - T_\\infty$ is the number of  strands that are not belong to the longest path."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 27.1-4\n",
    "\n",
    "> Construct a computation dag for which one execution of a greedy scheduler can take nearly twice the time of another execution of a greedy scheduler on the same number of processors. Describe how the two executions would proceed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The critical path is twice the length of the other path."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 27.1-5\n",
    "\n",
    "> Professor Karan measures her deterministic multithreaded algorithm on $4$, $10$, and $64$ processors of an ideal parallel computer using a greedy scheduler. She claims that the three runs yielded $T_4 = 80$ seconds, $T_{10} = 42$ seconds, and $T_{64} = 10$ seconds. Argue that the professor is either lying or incompetent. (Hint: Use the work law (27.2), the span law (27.3), and inequality (27.5) from Exercise 27.1-3.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on span law:\n",
    "\n",
    "$T_\\infty \\le T_P = \\min\\{ 80, 42, 10 \\} = 10$\n",
    "\n",
    "Based on inequality (27.5):\n",
    "\n",
    "$$\\left \\{ \n",
    "\\begin{array}{rll}\n",
    "T_1 + 3T_\\infty &\\ge& 320 \\\\\n",
    "T_1 + 9T_\\infty &\\ge& 420\n",
    "\\end{array}\n",
    "\\right .$$\n",
    "\n",
    "$6 T_\\infty \\ge 100$, $T_\\infty \\ge 16$, which contradicts the span law."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 27.1-6\n",
    "\n",
    "> Give a multithreaded algorithm to multiply an $n \\times n$ matrix by an $n$-vector that achieves $\\Theta(n^2 / \\lg n)$ parallelism while maintaining $\\Theta(n^2)$ work."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "VEC-TIMES-VEC(a, b, l, r)\n",
    "1  if l == r\n",
    "2      return a[l] * b[r]\n",
    "2  m = floor((l + r) / 2)\n",
    "3  spawn sum_l = VEC-TIMES-VEC(a, b, l, m)\n",
    "4  spawn sum_r = VEC-TIMES-VEC(a, b, m + 1, r)\n",
    "5  sync\n",
    "6  return sum_l + sum_r\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The multiply of two vectors is thus $\\Theta(\\lg n)$, there are $n$ vectors to multiply simultaneously, and the outer parallel for is optimized to $\\Theta(\\lg n)$, therefore $T_\\infty = \\Theta(\\lg n) + \\Theta(\\lg n) = \\Theta(\\lg n)$, since $T_1 = \\Theta(n^2)$, then the parallelism $T_1 / T_\\infty = \\Theta(n^2 / \\lg n)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 27.1-7\n",
    "\n",
    "> Consider the following multithreaded pseudocode for transposing an $n \\times n$ matrix $A$ in place:\n",
    "> \n",
    "> ```\n",
    "P-TRANSPOSE(A)\n",
    "1  n = A.rows\n",
    "2  parallel for j = 2 to n\n",
    "3      parallel for i = 1 to j - 1\n",
    "4          exchange a_ij with a_ji\n",
    "```\n",
    ">\n",
    "> Analyze the work, span, and parallelism of this algorithm."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Work: $T_1 = \\Theta(n^2)$.\n",
    "* Span: $T_\\infty = \\Theta(\\lg n) + \\Theta(\\lg n) + \\Theta(1) = \\Theta(\\lg n)$.\n",
    "* Parallelism: $T_1 / T_\\infty = \\Theta(n^2 / \\lg n)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 27.1-8\n",
    "\n",
    "> Suppose that we replace the__*parallel for*__ loop in line 3 of P-TRANSPOSE (see Exercise 27.1-7) with an ordinary __*for*__ loop. Analyze the work, span, and parallelism of the resulting algorithm."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Work: $T_1 = \\Theta(n^2)$.\n",
    "* Span: $T_\\infty = \\Theta(\\lg n) + \\Theta(n) = \\Theta(n)$.\n",
    "* Parallelism: $T_1 / T_\\infty = \\Theta(n)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 27.1-9\n",
    "\n",
    "> For how many processors do the two versions of the chess programs run equally fast, assuming that $T_P = T_1 / P + T_\\infty$?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\begin{array}{rll}\n",
    "T_1 / P + T_\\infty &=& T_1' / P + T_\\infty' \\\\\n",
    "2048 + P &=& 1024 + 8P \\\\\n",
    "P &=& 1024 / 7 \\\\\n",
    "&\\approx & 146\n",
    "\\end{array}\n",
    "$$"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
